Stackdriver Workspaces (Cloud Operations)
In this lesson, we will discuss provisioning Stackdriver workspaces.
This lesson introduces Stackdriver. Upcoming sections cover it in more depth.
Google is known for acquiring businesses and integrating their services into its own. Stackdriver is one of these. It started in 2012-13, and Google acquired Stackdriver in 2014.
In 2020, Stackdriver was renamed Cloud Operations. As this is an introductory lesson, we will look at what Stackdriver is used for and what it offers.
So, let’s explore Stackdriver.
Monitoring#
The first part of Cloud Operations is Monitoring, which tracks resources from one or more projects. A workspace is a tool for monitoring resources in one or more Google Cloud projects or AWS accounts. Yes, Stackdriver can also monitor AWS accounts, using an AWS connector. Let’s go through the submenus of Monitoring.
To open it, go to Main menu > Operations > Monitoring.
Workspaces#
The monitoring workspace is a one-stop solution for all monitoring requirements. By default, a workspace monitors its host project (also known as the scoping project), and the workspace’s name is automatically set to the host project’s name.
There are two types of projects in a workspace.
- Host project/Scoping project: This stores all of the Stackdriver workspace’s metadata.
- Monitored project: A Google Cloud project or AWS account that a workspace monitors. A workspace always monitors its Google Cloud host project. In addition, you can configure a workspace to monitor up to one hundred Google Cloud projects and AWS accounts alongside the host project.
Even though every project can have its own monitoring workspace, a separate project dedicated to the monitoring workspace keeps the entire monitoring workload in one place.
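To make the host/monitored relationship concrete, here is a minimal in-memory sketch. The `Workspace` class and its method names are hypothetical and are not part of any Google Cloud library; the only facts taken from the lesson are that the host project is always monitored, that the workspace name defaults to the host project's name, and that up to one hundred additional projects or accounts can be added.

```python
from dataclasses import dataclass, field
from typing import List

# Limit quoted in the lesson: up to 100 monitored projects/accounts
# in addition to the host project.
MAX_MONITORED = 100


@dataclass
class Workspace:
    """Hypothetical model of a workspace; not a Google Cloud API."""

    host_project: str
    monitored: List[str] = field(default_factory=list)

    @property
    def name(self) -> str:
        # The workspace name defaults to the host project's name.
        return self.host_project

    def add_monitored(self, project_or_account: str) -> None:
        # The host project is always monitored, so adding it is a no-op.
        if project_or_account == self.host_project:
            return
        if project_or_account in self.monitored:
            return
        if len(self.monitored) >= MAX_MONITORED:
            raise ValueError("workspace already monitors 100 projects/accounts")
        self.monitored.append(project_or_account)

    def all_monitored(self) -> List[str]:
        # Host project first, then everything added later.
        return [self.host_project, *self.monitored]


workspace = Workspace(host_project="monitoring-host")
workspace.add_monitored("app-prod")
print(workspace.name, workspace.all_monitored())
```

Modelling the host project separately from the monitored list mirrors the idea of a dedicated monitoring project: application projects can be added or removed without touching where the workspace itself lives.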
Dashboard#
A dashboard is a collection of charts that show metrics based on the resource type and filters you choose. There are four main types of dashboards in GCP.
- Created by GCP: Google Cloud creates default dashboards for the services you use. The type of these dashboards is Google Cloud Platform. Users cannot delete these dashboards.
- Custom: A dashboard created by a user. If the default dashboards do not serve the purpose, we can create custom dashboards.
- Application: When third-party services are installed on GCP resources, this dashboard provides detailed information about those services.
- Amazon Web Services: This dashboard monitors products provided by Amazon.
You can create separate dashboards to monitor different services. At this point, there will be one monitoring dashboard, for Cloud Storage; it shows stats about the buckets we created earlier. You can monitor resource-specific attributes such as requests, network traffic, and number of objects.
For example, suppose you want to monitor the CPU utilization of all the compute instances using a dashboard.
First, create a dashboard with a name that denotes its purpose. Then click “add charts” and select the resource types and other filters based on your requirements.
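The same CPU utilization dashboard can also be described as data instead of clicks. The sketch below builds a dictionary in the general shape of a Cloud Monitoring dashboard definition. The function name `cpu_dashboard`, the titles, and the 60-second alignment period are illustrative assumptions, and the field names reflect my understanding of the dashboard JSON format, so check them against the current API reference before relying on them.

```python
import json


def cpu_dashboard(display_name: str) -> dict:
    """Build a dashboard definition with one CPU utilization chart (sketch)."""
    # Monitoring filter: the CPU utilization metric for Compute Engine instances.
    cpu_filter = (
        'metric.type="compute.googleapis.com/instance/cpu/utilization" '
        'resource.type="gce_instance"'
    )
    chart = {
        "title": "CPU utilization (all instances)",  # illustrative title
        "xyChart": {
            "dataSets": [
                {
                    "plotType": "LINE",
                    "timeSeriesQuery": {
                        "timeSeriesFilter": {
                            "filter": cpu_filter,
                            # Average samples into 60-second buckets (assumed).
                            "aggregation": {
                                "alignmentPeriod": "60s",
                                "perSeriesAligner": "ALIGN_MEAN",
                            },
                        }
                    },
                }
            ]
        },
    }
    return {
        "displayName": display_name,
        "gridLayout": {"columns": 1, "widgets": [chart]},
    }


definition = cpu_dashboard("Compute CPU utilization")
print(json.dumps(definition, indent=2))
```

Keeping the definition as plain data makes it easy to review and to reuse across projects; the resource type and metric in the filter correspond to the choices made in the “add charts” form.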
Services#
This section is more architect-level. Services let you monitor a particular service against a “Service Level Objective” (SLO). An SLO is a business target for how reliably we serve our customers, for example the share of time the service must be available without interruption. Suppose we have committed to 99.5% uptime for a particular piece of software. Then, out of 720 hours in a 30-day month, our service must be up for at least 716.4 hours, which leaves at most 3.6 hours of downtime. The only requirement is that the user-defined service is deployed using Google Kubernetes Engine. Services help us achieve business-level SLOs.
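The uptime arithmetic is easy to reproduce for other targets. This is a small sketch; the function names are illustrative and not part of any monitoring API.

```python
def required_uptime_hours(slo_percent: float, period_hours: float) -> float:
    """Hours the service must be up to meet the SLO over the period."""
    return period_hours * slo_percent / 100.0


def allowed_downtime_hours(slo_percent: float, period_hours: float) -> float:
    """Hours of downtime the SLO tolerates over the period."""
    return period_hours - required_uptime_hours(slo_percent, period_hours)


# The lesson's example: 99.5% uptime over a 720-hour (30-day) month.
up = required_uptime_hours(99.5, 720)
down = allowed_downtime_hours(99.5, 720)
print(f"required uptime: {up:.1f} h, allowed downtime: {down:.1f} h")
# prints "required uptime: 716.4 h, allowed downtime: 3.6 h"
```

Tightening the target to 99.9% over the same month would shrink the allowed downtime to about 0.72 hours, which is why each extra "nine" is costly to commit to.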
Metrics explorer#
Metrics Explorer is a playground for analyzing metrics and building charts from them. You can select any metric available for a resource type and add it to any dashboard as a chart.
Alerting#
As the name says, Alerting sends an alert to the configured notification channels based on a policy.
A policy tracks a set of conditions, which can be defined using either an uptime check or metrics. It is good practice to document the possible steps for fixing the issue in the policy itself, so whoever receives the alert knows where to start.
When a condition is met, the Alerting service sends a message to all configured notification channels, telling the user which event or condition was triggered.
- Click on “Alerting.” You will get a form for selecting the condition that triggers the alert.
- You can create alerts based on specific metrics, uptime checks, or the status of a process.
- The Alerting service supports a range of notification channels. Let’s configure an email notification channel. We will use this configured email in the uptime check section.
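The relationship between a policy, its condition, and its notification channels can be sketched in a few lines. Everything here is a simplified, hypothetical model: the `AlertPolicy` class, the "N consecutive samples above a threshold" condition, and the use of plain callables as channels are assumptions for illustration, not the Alerting service's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AlertPolicy:
    """Hypothetical policy: fire when a metric stays above a threshold."""

    name: str
    threshold: float
    min_consecutive: int                      # samples in a row that must breach
    channels: List[Callable[[str], None]]     # each channel receives the message
    documentation: str = ""                   # steps to fix, sent with the alert
    _breaches: int = field(default=0, init=False)
    _firing: bool = field(default=False, init=False)

    def observe(self, value: float) -> bool:
        """Feed one metric sample; return True if notifications were sent."""
        if value > self.threshold:
            self._breaches += 1
        else:
            # Condition cleared: reset so a later breach can alert again.
            self._breaches = 0
            self._firing = False
        if self._breaches >= self.min_consecutive and not self._firing:
            self._firing = True
            message = (
                f"[{self.name}] value {value} above {self.threshold}. "
                f"{self.documentation}"
            )
            for notify in self.channels:
                notify(message)
            return True
        return False


sent: List[str] = []  # stands in for an email channel
policy = AlertPolicy(
    name="high-cpu",
    threshold=0.8,
    min_consecutive=3,
    channels=[sent.append],
    documentation="Runbook: check the instance group and scale out.",
)
for sample in [0.5, 0.9, 0.85, 0.95, 0.97, 0.4]:
    policy.observe(sample)
print(sent)
```

Requiring several consecutive breaches before notifying avoids paging on a single spike, and carrying the documentation string in the message reflects the advice to ship fix steps with the alert.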
Uptime checks#
Uptime checks monitor a particular service, app, or resource. When that service goes down, an incident is created, and based on that incident, an alert is sent to the configured notification channels.
Uptime checks are also used to create alerting policies, and they play an important role in measuring a service’s KPIs.
Uptime checks can be configured using three protocols:
- HTTP: Any HTTP endpoint hosted on a public IP address and reachable via a standard HTTP request can be tracked for its uptime.
- HTTPS: Any HTTPS website that is hosted publicly and reachable from a browser can be tracked. You could follow the uptime of https://google.com, for instance, and you can track your own website’s uptime the same way.
- TCP: Any other TCP-based application with a reachable port. These applications are tracked by periodically establishing a connection to the port. If the connection cannot be established, the alert is triggered.
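The three protocols boil down to two kinds of probe: an HTTP(S) request and a plain TCP connection attempt. The sketch below shows both using only the Python standard library. The function names, the 5-second timeout, and treating only 2xx responses as "up" are assumptions for illustration; the managed uptime check service has its own probing locations and rules.

```python
import socket
import urllib.error
import urllib.request


def tcp_check(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def http_check(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET to an http:// or https:// URL returns a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # urlopen raises HTTPError (a URLError) for 4xx/5xx responses
        # and ValueError for malformed URLs.
        return False


# Example usage (performs real network requests, so left commented out):
# print(http_check("https://google.com"))
# print(tcp_check("google.com", 443))
```

A scheduler would call these probes periodically and hand the results to an alerting policy; the same `http_check` function covers both HTTP and HTTPS because the URL scheme selects the protocol.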
The uptime check setup also suggests alerts that you may want to configure. Uptime checks support Email, SMS, Slack (Beta), PagerDuty (Beta), Cloud Pub/Sub (Beta), and Webhooks (Beta) notification channels. You can configure any of them.
Whenever we create an uptime check, the Alerting service automatically creates one failure policy for it. The failure policy’s condition is the failure of the uptime check: if the uptime check fails, the failure policy’s condition is met, and the Alerting service sends a notification to the configured notification channel.
As we have not created any service of our own, let’s track the uptime of https://google.com for demo purposes.
Groups#
Within a workspace, you can use Groups if you have to group only particular resources and monitor them separately.
For example, a service named “Image uploader” uses a cloud function, a bucket, and a PubSub topic; you can group and monitor all these resources simultaneously.
Groups also provide different filters to choose specific resources to meet the requirement.
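Conceptually, a group is just a filter applied to the resources in a workspace. The following sketch uses a hypothetical `Resource` type and a name-prefix plus label filter to pick out the “Image uploader” resources; the resource names, types, and labels are made up for illustration and do not reflect the real group filter syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Resource:
    """Hypothetical monitored resource; not a Google Cloud type."""

    name: str
    type: str
    labels: Dict[str, str] = field(default_factory=dict)


def group_members(
    resources: List[Resource],
    name_prefix: str = "",
    required_labels: Optional[Dict[str, str]] = None,
) -> List[Resource]:
    """Return resources whose name has the prefix and that carry every label."""
    required = required_labels or {}
    return [
        r
        for r in resources
        if r.name.startswith(name_prefix)
        and all(r.labels.get(k) == v for k, v in required.items())
    ]


inventory = [
    Resource("image-uploader-fn", "cloud_function", {"env": "prod"}),
    Resource("image-uploader-bucket", "gcs_bucket", {"env": "prod"}),
    Resource("image-uploader-topic", "pubsub_topic", {"env": "prod"}),
    Resource("billing-export-bucket", "gcs_bucket", {"env": "prod"}),
]
image_uploader = group_members(inventory, name_prefix="image-uploader")
print([r.name for r in image_uploader])
```

Because membership is computed from a filter rather than a fixed list, a new resource that matches the naming convention would join the group without any extra configuration.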
Settings#
Settings is the admin panel for Monitoring. It is used to carry out admin tasks such as configuring AWS accounts, adding other projects to the workspace, removing or moving projects from the current workspace, and merging other workspaces. It also contains documentation on how to set up agents for logging and monitoring different resources.
- Since a workspace can monitor more than one GCP project or AWS account, you can add one to the existing workspace by clicking the ADD GCP PROJECTS or ADD AWS ACCOUNT buttons.
- Since one project can be part of multiple workspaces, we can merge other workspaces into the current one. But remember, merging other workspaces into the current workspace deletes their configuration. To merge an additional workspace, click on the MERGE button.
Overview#
Finally, we have the Overview tab. As the name suggests, it summarizes all the other tabs at a glance: this monitoring dashboard displays all the policies, uptime checks, alerts, and groups.
The monitoring part of Cloud Operations is most useful when there is a production workload. Many different services can be used to monitor production applications; Monitoring is the GCP-specific service that collects stats about your GCP workloads automatically. Of all the features above, alerting is the most important, because it notifies users of unplanned downtime or application failures. This has been a brief overview of Cloud Operations’ monitoring service.